Guide - Using AI Telephone Bots
An AI Bot can be used as a voice agent or an advanced IVR system. It is possible to integrate into 3rd party systems as well as get it to perform telephony actions.
Contents
- Basic Configuration
- Tools
- Webhooks
- Variables, Secrets, and Session Variables
- Contexts
- Advanced Webhooks and Data Transformation
Basic Configuration
AI Bots are configured under console -> Stuff -> Add -> AI Bot.
Once a bot is created, it can be reached from a call rule. This document outlines the main items in an AI Bot config.
Configs are written in YAML. YAML has a very simple structure and is very expressive, allowing our AI to be very configurable. It would be wise to read an intro to YAML if you are not familiar with it. The section on webhooks also uses terms which should be familiar to users with knowledge of web APIs - if you intend to use webhooks, then we would recommend also brushing up your knowledge of those too.
Let's just dive in and start with a simple example.
description: >
You are an AI telephone bot for widgets ltd. Welcome the caller and find out their name.
initial: Thankyou for calling widgets limited, please can you tell me your name
language: en
That is as simple as it can be. If you create a call rule to run this AI Bot, it will say something like “thank you for calling Widgets Limited”. Who knows what else it will do, as there are no further instructions!
Note, the key initial (which is available in contexts and the root context, the root initial does not propagate into contexts) will be read out as the welcome message rather than an AI generated message - this can ensure certain things are said to the caller. This has the advantage of control, however the drawback is it becomes less human. To ensure it works well, the initial message should largely match the description so the conversation thread makes sense in order for the AI model to be able to continue the thread. Without it, the AI will generate the initial message.
Note, I have added language here as an example to force the language to English - this requires ISO 639 country codes. This is optional - but is a useful hint to some parts of the system such as speech to text.
So we probably want to build upon this.
description: >
You are an AI telephone bot. Start by welcoming the caller by thanking them for calling Widgets Limited. You goal is to figure out who it is best for the caller to speak with. If they want to speak to someone about sales then jump to extension 1000 all other requests jump to extension 1001.
Now, it actually has a goal. It will have a conversation with the caller in the aim of its goal, and at the end it will probably say jumping to extension 1000 or 1001 - but then do nothing. Why? As we all know, AI is evil. We have to control very carefully what we give it access to.
This is achieved by setting some permissions.
description: >
You are an AI telephone bot. Start by welcoming the caller by thanking them for calling Widgets Limited. Your goal is to figure out who it is best for the caller to speak with. If they want to speak to someone about sales then jump to extension 1000 all other requests jump to extension 1001.
tools:
jump_extension:
extensions:
- "1000"
- "1001"
We have given permission to the AI Bot to be able to jump to just 2 extensions, and we finally have a fully functional AI-driven Bot (agent). Nice! But, it is so basic - you could achieve this with a very simple auto-attendant built using call rules.
Tools
Before we go any further, let’s discuss the built-in tools.
tools:
send_sms:
destinations:
- 44776
send_sms_caller:
destinations:
- 44776
jump_extension:
extensions:
- "1000"
- "1001"
forward_call:
destinations:
- 0776
hangup: true
finish: true
This shows permissions we can set to access all the available built in tools.
- send_sms - send an sms to a number decided upon by the AI - the pattern has to match the pattern under the destinations field.
- send_sms_caller - send an sms back to the caller - the destination still has to match the destinations field.
- jump_extension - this will end the AI bot and pass the call to another (internal) extension (call rule or device) - the extensions field must match. You can also ask the AI to pass in a new caller id name and/or a note (which can be seen in babblevoice Desktop, note, the string are limited in length).
- forward_call - same as jump_extension but it sends the call outside of the system - ie to another phone number (this could be your mobile, for instance) the destinations field must match.
- hangup - when true allows the AI to end the call.
- finish - the call rule has a next param - when the AI decides to finish the AI (when this param is set to true) then the next part of the rule is followed. Same as jump_extension regarding name and note.
We can now get onto some advanced topics.
Webhooks
How do we integrate with 3rd party systems? Webhooks.
description: >
You are an AI telephone bot. Start fetching some appointments to offer the caller using the fetch appointments webhook.
webhooks:
fetch_appointments:
description: Fetches a list of available appointments a caller can book
url: http://myurl.co/appointments
method: GET
expect:
status: 200
content_type: application/json
Webhooks can be used to GET or POST information from other systems. Our goal is to make the configuration as flexible as possible so that we can form requests in the most common ways. We can post and fetch as URL-encoded or as JSON. In the above example, the structure of the appointments is not fixed, but we will pass what is sent to us and pass it back to the AI Bot for it to understand. As long as ambiguities are resolved (ie try to use unambiguous date/time formats - or explain the format in the description).
A little gotcha, content_type can appear in the webook and also under expect. In the webhook it is the header and packaging of the body of the request. In the expect section it is what the response should be. If items under expect do not match, then the webhook reports this to the AI to highlight the error. This often translates into the AI informing the user.
The webhook might return something like.
[ {
"uuid": "1234",
"date": "Wed Oct 05 2011 15:48:00 GMT+0100 (British Summer Time)"
} ]
Having put all of this together, we can finish writing the AI, which will allow a caller to make an appointment.
description: >
You are an AI telephone bot. Start asking the callers first and last name. Check any ambigous spellings. Then fetch some appointments to offer the caller using the fetch appointments webhook. Offer the appointments to the caller and ask them to choose so you can book an appointment for them.
webhooks:
fetch_appointments:
description: Fetches a list of available appointments a caller can book
url: http://myurl.co/appointments
method: GET
headers:
Authorization: Bearer ${{secret.webhookbearer}}
expect:
status: 200
content_type: application/json
book_appointment:
description: Book a specific appointment for the caller
url: http://myurl.co/appointments
method: POST
content_type: application/json
headers:
Authorization: Bearer ${{secret.webhookbearer}}
expect:
status: 200
fields:
id:
description: a unique id for this request
value: ${{ var.uuid }}
firstname:
description: the first name of the caller
type: string
lastname:
description: the last name of the caller
type: string
uuid:
description: the uuid taken from the appointment
type: string
Variables, Secrets, and Session Variables
All variable values are enclosed within ${{ }} (e.g., ${{ var.now }}).
Variables (vars): These values are derived from the ongoing call with the AI Bot and are addressed using var.. They can be injected into descriptions, webhook headers, or webhook post values.
Currently supported variables include:
- now: The date and time the value is inserted into the AI.
- uuid: A unique identifier for the call.
- callerid: The caller ID of the caller.
Session Variables (session): These values are stored by the AI when switching contexts. Refer to the "Contexts" section for further details.
Secrets: Secrets are managed within a domain-specific secrets manager. Each secret has a name, a value (e.g., a password), and a defined scope. The scope is a URL that specifies where the system is permitted to share the secret. Consequently, secrets can only be utilized by webhooks whose URLs match (fully or partially) the secret's scope. Secrets can be injected into a header or a value within the posted information, commonly used as a bearer token, as demonstrated in previous examples.
Contexts
A frequent challenge with AI is its tendency to lose focus on its primary task. As we aim to provide more nuanced instructions and facilitate back-and-forth interactions between the caller and the AI, the AI may begin to lose sight of its objectives (internally, its input will be cropped). To mitigate this, we offer the ability to define distinct contexts for different tasks, allowing each context to concentrate more acutely on its specific goals.
Take the below.
description: you are a telephone AI bot. Welcome the caller to our company Widgets Limited.
start: who_are_you
contexts:
who_are_you:
description: >
Collect the callers name and age. After you have done that find out if the caller
wants to let us know about colours or something else.
collect:
first_name:
description: the firstname of the caller
age:
description: the persons age
required: false
summary:
description: a summary of the conversation so far
contexts:
- another_context
- perhaps_another_context_with_purpose
another_context:
purpose: this is the overriding purpose of this context
description: do more stuff regarding ${{ session.summary }}
webhooks:
- mypostwebhook
tools:
hangup: true
perhaps_another_context_with_purpose:
purpose: find out what colours the caller likes
description: ask the caller what is their favourite colour, their second favourite and their third.
When working with AI contexts, keep the following points in mind:
- Core Description Persistence: The AI always retains a core description that is prepended to any other context. For example, the full description will be the core description combined with the specific context description.
- Context Switching: The AI can decide to switch contexts at any time, based on the instructions provided within its current context.
- Purpose-Driven Switching: If a context is permitted to switch to another (by inclusion in a designated list), the purpose of the new context is provided to the AI. This allows the AI to assess the reason for the change.
- Session Variables for Switching: When a context is allowed to switch, it can utilise a "collect" key to store session variables. These variables can then be accessed in the new context or even by a webhook.
- New Conversation Paradigm: Switching to a new AI context is akin to starting a fresh conversation. This is why session variables are crucial for maintaining continuity.
- Tool and Webhook Permissions: For the AI within a specific context to control calls or access webhooks, it requires a dedicated "tools" section and a list of authorised webhooks.
Some advanced points on webhooks and data transformation
We aim to be able to call as many web API URLs as possible. Some users will be able to build an endpoint which will match a simple definition in our webhook. But we also try to make a webhook as configurable as possible so that it should be able to simply drop into an existing API URL.
Encodings
Currently, we support
- application/x-www-form-urlencoded
- application/json
Our default is JSON, so this option can be left out. This value is set at the root of the webhook definition (ie the content type we are sending vs the same value in the expect key, which says what to expect in response).
Body
When we POST a document to a remote API (our webhook), we package the data as defined under the fields key using both the content type as defined before and also information provided in the field definition.
webhooks:
name: book_appointment
description: Book a specific appointment for the caller
url: http://myurl.co/appointments
method: POST
content_type: application/json
headers:
Authorization: Bearer ${{secret.webhookbearer}}
expect:
status: 200
fields:
id:
description: a unique id for this request
value: ${{ var.uuid }}
firstname:
path: contact.firstname
description: the first name of the caller
type: string
lastname:
path: contact.lastname
description: the last name of the caller
type: string
summary:
value: ${{ session.summary }}
uuid:
description: the uuid taken from the appointment
type: string
favourite_colour:
description: the callers favourite colour
type: string
required: false
data:
compute: >
messages|mappairs({
when: {
n: { role: 'assistant', content: 'present' },
n1: { role: 'user', content: 'present' }
},
map: {
question: 'n.content',
answer: 'n1.content',
type: 'qa'
}
})
This example has a host of different mechanisms used in packaging up data encoded as JSON. Lets go through them.
id: This example takes the value from a var, the AI will not inject anything - it comes directly from information about the phone call.
firstname and lastname: These 2 examples show how to use a path to set values within the JSON object. By default (without this value) the key firstname would be set in the root object of the JSON being packaged. With this value, it sets a path.
The below has 2 examples, the first without the path, the second with.
{ "firstname": "John", "lastname": "Doe" }
{ "contact": { "firstname": "John", "lastname": "Doe" } }
summary: This is another example of setting the value from a variable - but in this case, a session variable rather than a variable from the call. This session variable would have to have been set from a context switch during the AI at some point.
uuid: Similar to firstname and lastname - but with no path. The AI will decide what to provide for this value. Given the name of it it is likely to have been supplied by another webhook call.
favourite_colour: I added this to the example just to show the use of the required flag. The default of this is true. But there are some considerations in getting information into the webhook.
- It has to be defined in the webhook.
- Even if it is defined, the AI has to be given sufficient instructions on how to gather that information.
- The description in the field helps the AI tie data together, but in itself does not provide an instruction to the AI to gather that information.
- If required is true, when we call the webhook, we will supply something, even if it is just “”
- If required is false, and the AI has not supplied anything, we will omit the field altogether
data: This is a more complex structure. In some cases, we actually want to supply the transcript of the discussion to a webhook - and the source key performs this function. But we also have the ability to modify the structure that is posted to the URL. With the map function, we define how we want to present the data. Without out, you will receive an array in the ChatML style.
In the example, it looks through the array for “assistant”, “user” pairs and rewrites them into a new array which contains an object in the format:
{ "question": "tell me your first name", "answer": "John", "type": "qa" }
Form encoded
Some functionality is not used in form encoded - path, for example. This is a much simpler format - but is often used in form posting services (services you can open a simple account and have us post requests to capture data easily). Most of these services support form-encoded.
Example
name=John+Doe&email=john%40example.com&message=Hello+John%21
Compute & When
contexts:
start:
purpose: use me to collect basic information
description: collect the callers first and last name and date of birth
collect:
dob:
description: the callers date of birth in the format YYYY-MM-DD
age:
compute: yearssince( session.dob )
contexts:
- decidenext
decidenext:
purpose: this is used to understand what to do next
description: ask the caller what they would like from us
contexts:
- childhoodimms
childhoodimms:
purpose: the caller is asking for childhood imms
when: session.age < 12
webhooks:
ourwebhook:
fields:
data:
compute: yearssince( session.dob )
There are 2 new things happening in this example.
The age session variable is not being collected by the AI but is calculated from other values it has. We have instructed the AI to collect the date of birth of the caller, but we need to perform some maths on the date collected to calculate the age of the caller. We could ask the AI to perform this function, but AI is not always perfect at maths so we can pull this function out of the AI and into an actual function to calculate it. When a collect item has a description, the description is used to instruct the AI on how to populate that variable. Without the description, you can use compute to offer an expression to calculate the value.
The second thing is we are then using logic and not AI to decide if to include a context or not. In the above example, childhoodimms has the key ‘when’ this has to be true for it to be included in the previous context. Be careful, this test would be false in the ‘decidenext’ context as the age session value is only calculated once the AI has asked to switch context.